Abstract

Business optimization is becoming increasingly important because all business activities aim
to maximize the profit and performance of products and services, under limited resources and
appropriate constraints. Recent developments in support vector machine and metaheuristics
show many advantages of these techniques. In particular, particle swarm optimization is now
widely used in solving tough optimization problems. In this paper, we use a combination of a
recently developed Accelerated PSO and a nonlinear support vector machine to form a framework
for solving business optimization problems. We first apply the proposed APSO-SVM to
production optimization, and then use it for income prediction and project scheduling. We also
carry out some parametric studies and discuss the advantages of the proposed metaheuristic
SVM.

1 Introduction

Many business activities have to deal with large, complex databases. This is partly driven by
information technology, especially the Internet, and partly driven by the need to extract meaningful
knowledge by data mining. Extracting useful information from a huge amount of data requires
efficient tools for processing vast data sets. This is equivalent to trying to find an optimal solution
to a highly nonlinear problem with multiple, complex constraints, which is a challenging task.
Various techniques for such data mining and optimization have been developed over the past few
decades. Among these techniques, support vector machine is one of the best techniques for regression,
classification and data mining [5, 9, 16, 19, 20, 24].



On the other hand, metaheuristic algorithms have also become powerful tools for solving tough nonlinear
optimization problems [1, 7, 8, 27, 32]. Modern metaheuristic algorithms have been developed
with the aim of carrying out global search; typical examples are genetic algorithms [6], particle swarm
optimisation (PSO) [7], and cuckoo search [29, 30]. The efficiency of metaheuristic algorithms can
be attributed to the fact that they imitate the best features in nature, especially the selection of the
fittest in biological systems which have evolved by natural selection over millions of years. Since most
data have noise or associated randomness, most of these algorithms cannot be used directly. In this
case, some form of averaging or reformulation of the problem often helps. Even so, most algorithms
become difficult to implement for this type of optimization.

In addition to the above challenges, business optimization often deals with large but often
incomplete data sets that evolve dynamically over time. Certain tasks cannot start before other
required tasks are completed; such complex scheduling is often NP-hard and no universally efficient
tool exists. Recent trends indicate that metaheuristics can be very promising, in combination with
other tools such as neural networks and support vector machines [5, 9, 15, 21].

In this paper, we intend to present a simple framework of business optimization using a combination
of support vector machine with accelerated PSO. The paper is organized as follows: we first
briefly review particle swarm optimization and accelerated PSO, and then introduce the basics
of support vector machines (SVM). We then use three case studies to test the proposed framework.
Finally, we discuss its implications and possible extensions for further research.

2 Accelerated Particle Swarm Optimization 
2.1 PSO 

Particle swarm optimization (PSO) was developed by Kennedy and Eberhart in 1995 [7, 8], based on
swarm behaviour such as fish and bird schooling in nature. Since then, PSO has generated much
wider interest, and forms an exciting, ever-expanding research subject called swarm intelligence.
PSO has been applied to almost every area in optimization, computational intelligence, and
design/scheduling applications. There are at least two dozen PSO variants, and hybrid algorithms
that combine PSO with other existing algorithms are also increasingly popular.

PSO searches the space of an objective function by adjusting the trajectories of individual agents,
called particles, as the piecewise paths formed by positional vectors in a quasi-stochastic manner.
The movement of a swarming particle consists of two major components: a stochastic component
and a deterministic component. Each particle is attracted toward the position of the current global
best $\mathbf{g}^*$ and its own best location $\mathbf{x}_i^*$ in history, while at the same time it has a tendency to move
randomly.

Let $\mathbf{x}_i$ and $\mathbf{v}_i$ be the position vector and velocity for particle $i$, respectively. The new velocity
vector is determined by the following formula

$$\mathbf{v}_i^{t+1} = \mathbf{v}_i^t + \alpha \boldsymbol{\epsilon}_1 [\mathbf{g}^* - \mathbf{x}_i^t] + \beta \boldsymbol{\epsilon}_2 [\mathbf{x}_i^* - \mathbf{x}_i^t], \tag{1}$$

where $\boldsymbol{\epsilon}_1$ and $\boldsymbol{\epsilon}_2$ are two random vectors, with each entry taking values between 0 and 1. The
parameters $\alpha$ and $\beta$ are the learning parameters or acceleration constants, which can typically be
taken as, say, $\alpha \approx \beta \approx 2$.

There are many variants which extend the standard PSO algorithm, and the most noticeable
improvement is probably to use an inertia function $\theta(t)$ so that $\mathbf{v}_i^t$ is replaced by $\theta(t) \mathbf{v}_i^t$

$$\mathbf{v}_i^{t+1} = \theta \mathbf{v}_i^t + \alpha \boldsymbol{\epsilon}_1 [\mathbf{g}^* - \mathbf{x}_i^t] + \beta \boldsymbol{\epsilon}_2 [\mathbf{x}_i^* - \mathbf{x}_i^t], \tag{2}$$

where $\theta \in (0, 1)$ [2, 3]. In the simplest case, the inertia function can be taken as a constant, typically
$\theta \approx 0.5 \sim 0.9$. This is equivalent to introducing a virtual mass to stabilize the motion of the particles,
and thus the algorithm is expected to converge more quickly.
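
As an illustration of Eq. (2), the following is a minimal Python sketch of one PSO update, not our Matlab implementation. The function name `pso_step`, its argument layout and the default values are assumptions made for this example; the position update $\mathbf{x} \leftarrow \mathbf{x} + \mathbf{v}$ is the standard one. Setting `theta=1.0` recovers Eq. (1).

```python
import random


def pso_step(positions, velocities, personal_best, global_best,
             alpha=2.0, beta=2.0, theta=0.7, rng=random):
    """Apply Eq. (2) to every particle, then move each particle by x <- x + v.

    positions, velocities, personal_best: lists of equal-length float lists.
    global_best: one float list. All names here are illustrative assumptions.
    """
    new_positions, new_velocities = [], []
    for x, v, x_star in zip(positions, velocities, personal_best):
        v_new = []
        for d in range(len(x)):
            eps1 = rng.random()  # entry of epsilon_1, uniform in [0, 1)
            eps2 = rng.random()  # entry of epsilon_2, uniform in [0, 1)
            v_new.append(theta * v[d]
                         + alpha * eps1 * (global_best[d] - x[d])
                         + beta * eps2 * (x_star[d] - x[d]))
        new_velocities.append(v_new)
        new_positions.append([xd + vd for xd, vd in zip(x, v_new)])
    return new_positions, new_velocities
```

A particle that already sits at both its own best and the global best, with zero velocity, is left unchanged by this update, which is a quick sanity check on the signs of the two attraction terms.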



2.2 Accelerated PSO 

The standard particle swarm optimization uses both the current global best $\mathbf{g}^*$ and the individual
best $\mathbf{x}_i^*$. The reason for using the individual best is primarily to increase the diversity in the quality
solutions; however, this diversity can be simulated using some randomness. Consequently, there is
no compelling reason for using the individual best, unless the optimization problem of interest is
highly nonlinear and multimodal.

A simplified version which could accelerate the convergence of the algorithm is to use the global
best only. Thus, in the accelerated particle swarm optimization (APSO) [27, 32], the velocity vector
is generated by a simpler formula

$$\mathbf{v}_i^{t+1} = \mathbf{v}_i^t + \alpha \boldsymbol{\epsilon}_n + \beta (\mathbf{g}^* - \mathbf{x}_i^t), \tag{3}$$

where $\boldsymbol{\epsilon}_n$ is drawn from $N(0, 1)$ to replace the second term. The update of the position is simply

$$\mathbf{x}_i^{t+1} = \mathbf{x}_i^t + \mathbf{v}_i^{t+1}. \tag{4}$$

In order to increase the convergence even further, we can also write the update of the location in a
single step

$$\mathbf{x}_i^{t+1} = (1 - \beta) \mathbf{x}_i^t + \beta \mathbf{g}^* + \alpha \boldsymbol{\epsilon}_n. \tag{5}$$

This simpler version will give the same order of convergence. Typically, $\alpha = 0.1L \sim 0.5L$ where $L$ is
the scale of each variable, while $\beta = 0.1 \sim 0.7$ is sufficient for most applications. It is worth pointing
out that velocity does not appear in equation (5), and there is no need to deal with initialization
of velocity vectors. Therefore, APSO is much simpler. Compared with many PSO variants, APSO
uses only two parameters, and the mechanism is simple to understand.

A further improvement to the accelerated PSO is to reduce the randomness as iterations proceed.
This means that we can use a monotonically decreasing function such as

$$\alpha = \alpha_0 e^{-\gamma t}, \tag{6}$$

or

$$\alpha = \alpha_0 \gamma^t, \quad (0 < \gamma < 1), \tag{7}$$

where $\alpha_0 \approx 0.5 \sim 1$ is the initial value of the randomness parameter. Here $t$ is the number of
iterations or time steps, and $0 < \gamma < 1$ is a control parameter [32]. For example, in our implementation,
we will use

$$\alpha = 0.7^t, \tag{8}$$

where $t \in [0, t_{\max}]$ and $t_{\max}$ is the maximum number of iterations.
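
The single-step update (5) with the decreasing randomness (7) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than our Matlab code: the names `apso_minimize` and `sphere`, the scaling of the noise by the variable range $L$, the clipping to the search box, and the immediate update of the global best are all choices made for this example.

```python
import random


def apso_minimize(f, lower, upper, n_particles=20, t_max=100,
                  beta=0.5, gamma=0.7, alpha0=1.0, seed=0):
    """Minimize f over a box with the single-step APSO update of Eq. (5)."""
    rng = random.Random(seed)
    dim = len(lower)
    scale = [hi - lo for lo, hi in zip(lower, upper)]  # L, the variable scale
    swarm = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
             for _ in range(n_particles)]
    best = list(min(swarm, key=f))
    best_val = f(best)
    for t in range(1, t_max + 1):
        alpha = alpha0 * gamma ** t  # Eq. (7); alpha0 = 1, gamma = 0.7 gives Eq. (8)
        for i, x in enumerate(swarm):
            candidate = []
            for d in range(dim):
                # Eq. (5): x <- (1 - beta) x + beta g* + alpha eps_n
                value = ((1.0 - beta) * x[d] + beta * best[d]
                         + alpha * scale[d] * rng.gauss(0.0, 1.0))
                # Clipping to the box is an assumption, not part of Eq. (5).
                candidate.append(min(max(value, lower[d]), upper[d]))
            swarm[i] = candidate
            val = f(candidate)
            if val < best_val:
                best, best_val = list(candidate), val
    return best, best_val


def sphere(x):
    """Simple test function with its minimum 0 at the origin."""
    return sum(xd * xd for xd in x)
```

For instance, `apso_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])` returns a point close to the origin. No velocity array is stored, which is the practical simplification noted above.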

3 Support Vector Machine 

Support vector machine (SVM) is an efficient tool for data mining and classification [25, 26]. Due to 
the vast volumes of data in business, especially e-commerce, efficient use of data mining techniques 
becomes a necessity. In fact, SVM can also be considered as an optimization tool, as its objective 
is to maximize the separation margins between data sets. The proper combination of SVM with
metaheuristics could be advantageous.

3.1 Support Vector Machine 

A support vector machine essentially transforms a set of data into a significantly higher-dimensional 
space by nonlinear transformations so that regression and data fitting can be carried out in this 
high-dimensional space. This methodology can be used for data classification, pattern recognition, 
and regression, and its theory was based on statistical machine learning theory [21, 24, 25]. 

For classifications with the learning examples or data $(\mathbf{x}_i, y_i)$ where $i = 1, 2, ..., n$ and
$y_i \in \{-1, +1\}$, the aim of the learning is to find a function $\phi_\alpha(\mathbf{x})$ from allowable functions
$\{\phi_\alpha : \alpha \in \Omega\}$ such that $\phi_\alpha(\mathbf{x}_i) \mapsto y_i$ for $(i = 1, 2, ..., n)$ and that the expected risk $E(\alpha)$ is minimal. That is the
minimization of the risk

$$E(\alpha) = \frac{1}{2} \int |\phi_\alpha(\mathbf{x}) - y| \, dQ(\mathbf{x}, y), \tag{9}$$

where $Q(\mathbf{x}, y)$ is an unknown probability distribution, which makes it impossible to calculate $E(\alpha)$
directly. A simple approach is to use the so-called empirical risk

$$E_p(\alpha) \approx \frac{1}{2n} \sum_{i=1}^{n} |\phi_\alpha(\mathbf{x}_i) - y_i|. \tag{10}$$

However, the main flaw of this approach is that a small risk or error on the training set does not
necessarily guarantee a small error on prediction if the number $n$ of training data is small [26].
For a given probability of at least $1 - p$, the Vapnik bound for the errors can be written as

$$E(\alpha) \le E_p(\alpha) + \Phi\Big(\frac{h}{n}, \frac{\log p}{n}\Big), \tag{11}$$

where

$$\Phi\Big(\frac{h}{n}, \frac{\log p}{n}\Big) = \sqrt{\frac{1}{n} \Big[ h \Big( \log \frac{2n}{h} + 1 \Big) - \log \frac{p}{4} \Big]}. \tag{12}$$

Here $h$ is a parameter, often referred to as the Vapnik-Chervonenkis dimension or simply
VC-dimension [24], which describes the capacity for prediction of the function set $\phi_\alpha$.

In essence, a linear support vector machine constructs two hyperplanes that are as far apart as
possible, with no samples lying between them. Mathematically, this is equivalent to
two equations

$$\mathbf{w} \cdot \mathbf{x} + b = \pm 1, \tag{13}$$

and a main objective of constructing these two hyperplanes is to maximize the distance (between
the two planes)

$$d = \frac{2}{\|\mathbf{w}\|}. \tag{14}$$

Such maximization of $d$ is equivalent to the minimization of $\|\mathbf{w}\|$ or more conveniently $\|\mathbf{w}\|^2$. From
the optimization point of view, the maximization of margins can be written as

$$\text{minimize} \quad \frac{1}{2} \|\mathbf{w}\|^2 = \frac{1}{2} (\mathbf{w} \cdot \mathbf{w}). \tag{15}$$

This essentially becomes an optimization problem

$$\text{minimize} \quad \Psi = \frac{1}{2} \|\mathbf{w}\|^2 + \lambda \sum_{i=1}^{n} \eta_i, \tag{16}$$

$$\text{subject to} \quad y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \eta_i, \tag{17}$$

$$\eta_i \ge 0, \quad (i = 1, 2, ..., n), \tag{18}$$

where $\lambda > 0$ is a parameter to be chosen appropriately. Here, the term $\sum_{i=1}^{n} \eta_i$ is essentially a
measure of the upper bound of the number of misclassifications on the training data.
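
To make the role of the slack variables concrete, the short Python sketch below evaluates the objective (16) for a given hyperplane. It assumes that each $\eta_i$ takes the smallest value permitted by constraints (17) and (18), namely $\eta_i = \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$; the function name `soft_margin_objective` and the argument `lam` (for $\lambda$) are illustrative.

```python
def soft_margin_objective(w, b, X, y, lam):
    """Evaluate Eq. (16) using the smallest slacks allowed by Eqs. (17)-(18).

    w: weight vector, b: offset, X: list of samples, y: labels in {-1, +1},
    lam: the penalty parameter lambda. Returns (objective value, slacks).
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(wk * xk for wk, xk in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))  # eta_i = 0 outside the margin
    norm_sq = sum(wk * wk for wk in w)
    return 0.5 * norm_sq + lam * sum(slacks), slacks
```

With `w = [1.0, 0.0]`, `b = 0.0` and the three samples `[2, 0]`, `[-2, 0]`, `[0.5, 0]` labelled `+1, -1, +1`, only the third sample lies inside the margin, so its slack is 0.5 and the objective is 0.5 + 0.5 = 1.0 for `lam = 1.0`.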

3.2 Nonlinear SVM and Kernel Tricks 

The so-called kernel trick is an important technique, transforming data dimensions while
simplifying computation. By using Lagrange multipliers $\alpha_i \ge 0$, we can rewrite the above constrained
optimization into an unconstrained version, and we have

$$L = \frac{1}{2} \|\mathbf{w}\|^2 + \lambda \sum_{i=1}^{n} \eta_i - \sum_{i=1}^{n} \alpha_i [y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - (1 - \eta_i)]. \tag{19}$$



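From this, we can write the Karush-Kuhn-Tucker conditions

$$\frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i = 0, \tag{20}$$

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0, \tag{21}$$

$$y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - (1 - \eta_i) \ge 0, \tag{22}$$

$$\alpha_i [y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - (1 - \eta_i)] = 0, \quad (i = 1, 2, ..., n), \tag{23}$$

$$\alpha_i \ge 0, \quad \eta_i \ge 0, \quad (i = 1, 2, ..., n). \tag{24}$$

The first of these conditions gives

$$\mathbf{w} = \sum_{i=1}^{n} y_i \alpha_i \mathbf{x}_i. \tag{25}$$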

It is worth pointing out here that only the nonzero $\alpha_i$ contribute to the overall solution. This comes
from the KKT condition (23), which implies that when $\alpha_i \ne 0$, the inequality (17) must be satisfied
exactly, while $\alpha_i = 0$ means the inequality is automatically met. In this latter case, $\eta_i = 0$.
Therefore, only the corresponding training data $(\mathbf{x}_i, y_i)$ with $\alpha_i > 0$ can contribute to the solution,
and thus such $\mathbf{x}_i$ form the support vectors (hence, the name support vector machine). All the other
data with $\alpha_i = 0$ become irrelevant.

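It has been shown that the solution for $\alpha_i$ can be found by solving the following quadratic
programming [24, 26]

$$\text{maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j), \tag{26}$$

subject to

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le \lambda, \quad (i = 1, 2, ..., n). \tag{27}$$

From the coefficients $\alpha_i$, we can write the final classification or decision function as

$$f(\mathbf{x}) = \text{sgn} \Big[ \sum_{i=1}^{n} \alpha_i y_i (\mathbf{x} \cdot \mathbf{x}_i) + b \Big], \tag{28}$$

where sgn is the classic sign function.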

As most problems in business applications are nonlinear, the above linear SVM cannot be
used directly. Ideally, we should find some nonlinear transformation $\phi$ so that the data can be mapped onto a
high-dimensional space where the classification becomes linear. The transformation should be chosen
in a certain way so that their dot product leads to a kernel-style function $K(\mathbf{x}, \mathbf{x}_i) = \phi(\mathbf{x}) \cdot \phi(\mathbf{x}_i)$. In
fact, we do not need to know such transformations; we can directly use the kernel functions $K(\mathbf{x}, \mathbf{x}_i)$
to complete this task. This is the so-called kernel function trick. Now the main task is to choose a
suitable kernel function for a given, specific problem.

For most problems in nonlinear support vector machines, we can use $K(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x} \cdot \mathbf{x}_i)^d$ for
polynomial classifiers, $K(\mathbf{x}, \mathbf{x}_i) = \tanh[k (\mathbf{x} \cdot \mathbf{x}_i) + \Theta]$ for neural networks, and by far the most widely
used kernel is the Gaussian radial basis function (RBF)

$$K(\mathbf{x}, \mathbf{x}_i) = \exp\Big[ -\frac{\|\mathbf{x} - \mathbf{x}_i\|^2}{2\sigma^2} \Big] = \exp\big[ -\gamma \|\mathbf{x} - \mathbf{x}_i\|^2 \big], \tag{29}$$

for the nonlinear classifiers. This kernel can easily be extended to any high dimensions. Here $\sigma^2$ is
the variance and $\gamma = 1/(2\sigma^2)$ is a constant. In general, a simple bound of $0 < \gamma \le C$ is used, and
here $C$ is a constant.
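
The RBF kernel (29) is simple to evaluate directly. The Python sketch below is illustrative only: `rbf_kernel` computes Eq. (29), and `decision` shows the kernelized counterpart of the decision function (28), obtained under the usual assumption that the dot product $\mathbf{x} \cdot \mathbf{x}_i$ is replaced by $K(\mathbf{x}, \mathbf{x}_i)$. Both function names and the tie-breaking rule at zero are choices made for this example.

```python
import math


def rbf_kernel(x, z, sigma2):
    """Gaussian RBF kernel of Eq. (29), with gamma = 1 / (2 sigma^2)."""
    gamma = 1.0 / (2.0 * sigma2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision(x, support_vectors, labels, alphas, b, sigma2):
    """Kernelized form of Eq. (28): x . x_i is replaced by K(x, x_i).

    Returns +1 or -1; a score of exactly zero is assigned to +1 here,
    which is a convention chosen for this sketch.
    """
    score = sum(a * y * rbf_kernel(x, sv, sigma2)
                for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if score + b >= 0 else -1
```

The kernel equals 1 when the two points coincide and decays with their squared distance, so a small $\sigma^2$ (large $\gamma$) makes each support vector influence only its immediate neighbourhood.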



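Following a procedure similar to that discussed earlier for linear SVM, we can obtain the coefficients
$\alpha_i$ by solving the following optimization problem

$$\text{maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j). \tag{30}$$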

It is worth pointing out that, under Mercer's conditions for the kernel function, the matrix $A = (y_i y_j K(\mathbf{x}_i, \mathbf{x}_j))$
is a symmetric positive semidefinite matrix [26], which implies that the above maximization is a concave quadratic
programming problem, and can thus be solved efficiently by standard QP techniques [21].

4 Metaheuristic Support Vector Machine with APSO 

4.1 Metaheuristics 

There are many metaheuristic algorithms for optimization, and most of these algorithms are inspired
by nature [27]. Metaheuristic algorithms such as genetic algorithms and simulated annealing are
widely used, almost routinely, in many applications, while relatively new algorithms such as particle
swarm optimization [7], firefly algorithm and cuckoo search are becoming more and more popular
[27, 32]. Hybridization of these algorithms with existing algorithms is also emerging.

The advantage of such a combination is to use a balanced tradeoff between global search, which
is often slow, and a fast local search. Such a balance is important, as highlighted by the analysis by
Blum and Roli [1]. Another advantage of this method is that we can use any algorithms we like at
different stages of the search or even at different stages of the iterations. This makes it easy to combine
the advantages of various algorithms so as to produce better results.

Others have attempted to carry out parameter optimization associated with neural networks and 
SVM. For example, Liu et al. have used SVM optimized by PSO for tax forecasting [13]. Lu et al. 
proposed a model for finding optimal parameters in SVM by PSO optimization [14]. However, here 
we intend to propose a generic framework for combining efficient APSO with SVM, which can be 
extended to other algorithms such as firefly algorithm [28, 31]. 

4.2 APSO-SVM 

Support vector machine has a major advantage, that is, it is less likely to overfit, compared with 
other methods such as regression and neural networks. In addition, efficient quadratic programming 
can be used for training support vector machines. However, when there is noise in the data, such 
algorithms are not quite suitable. In this case, the learning or training to estimate the parameters 
in the SVM becomes difficult or inefficient. 

Another issue is the choice of the values of the kernel parameters $C$ and $\sigma^2$ in the kernel
functions; there is no agreed guideline on how to choose them, though the choice of their
values should make the SVM as efficient as possible. This itself is essentially an optimization
problem.

Taking this idea further, we first use an educated guess for the set of values and use
metaheuristic algorithms such as accelerated PSO or cuckoo search to find the best kernel parameters such as $C$
and $\sigma^2$ [27, 29]. Then, we use these parameters to construct the support vector machines, which
are then used for solving the problem of interest. During the iterations and optimization, we can
also modify kernel parameters and evolve the SVM accordingly. This framework can be called a
metaheuristic support vector machine. Schematically, this Accelerated PSO-SVM can be represented
as shown in Fig. 1.

For the optimization of parameters and business applications discussed below, APSO is used for 
both local and global search [27, 32]. 



begin
    Define the objective;
    Choose kernel functions;
    Initialize various parameters;
    while (criterion)
        Find optimal kernel parameters by APSO;
        Construct the support vector machine;
        Search for the optimal solution by APSO-SVM;
        Increase the iteration counter;
    end
    Post-processing the results;
end

Figure 1: Metaheuristic APSO-SVM.
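
The step "Find optimal kernel parameters by APSO" in Fig. 1 can be sketched as follows. This is a hedged illustration, not our Matlab implementation: `validation_error` is a purely hypothetical stand-in for training an SVM with given $(C, \sigma^2)$ and measuring its error, here a synthetic bowl with an arbitrarily chosen minimum at $C = 100$, $\sigma^2 = 50$. Searching in $\log_{10}$ space, the bounds, and the function name `tune_kernel_parameters` are likewise assumptions of this example.

```python
import math
import random


def validation_error(C, sigma2):
    """Hypothetical stand-in for 'train an SVM with (C, sigma2), return its error'.

    A smooth synthetic surface with its minimum at C = 100, sigma2 = 50.
    """
    return ((math.log10(C) - 2.0) ** 2
            + (math.log10(sigma2) - math.log10(50.0)) ** 2)


def tune_kernel_parameters(error_fn, log_bounds=((-1.0, 3.0), (-1.0, 3.0)),
                           n_particles=15, t_max=60, beta=0.5, gamma=0.7, seed=1):
    """APSO search over (log10 C, log10 sigma2) using the update of Eq. (5)."""
    rng = random.Random(seed)

    def error_at(p):
        return error_fn(10.0 ** p[0], 10.0 ** p[1])

    swarm = [[rng.uniform(lo, hi) for lo, hi in log_bounds]
             for _ in range(n_particles)]
    best = list(min(swarm, key=error_at))
    best_err = error_at(best)
    for t in range(1, t_max + 1):   # plays the role of "while (criterion)"
        alpha = gamma ** t          # decreasing randomness, Eq. (7)
        for i, p in enumerate(swarm):
            q = []
            for d, (lo, hi) in enumerate(log_bounds):
                step = ((1.0 - beta) * p[d] + beta * best[d]
                        + alpha * (hi - lo) * rng.gauss(0.0, 1.0))
                q.append(min(max(step, lo), hi))  # keep inside the bounds
            swarm[i] = q
            err = error_at(q)
            if err < best_err:
                best, best_err = list(q), err
    return 10.0 ** best[0], 10.0 ** best[1], best_err
```

In a real application, `validation_error` would be replaced by a routine that constructs the support vector machine with the candidate parameters and returns, for example, its error on held-out data.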



5 Business Optimization Benchmarks 

Using the framework discussed earlier, we can easily implement it in any programming language,
though we have implemented it using Matlab. We have validated our implementation using
standard test functions, which confirms the correctness of the implementation. Now we apply it to carry
out case studies with known analytical or known optimal solutions. The Cobb-Douglas
production optimization has an analytical solution which can be used for comparison, while the
second case is a standard benchmark in resource-constrained project scheduling [11].

5.1 Production Optimization 

Let us first use the proposed approach to study the classical Cobb-Douglas production optimization.
For a production of a series of products and the labour costs, the utility function can be written

$$q = \prod_{j=1}^{n} u_j^{\alpha_j} = u_1^{\alpha_1} u_2^{\alpha_2} \cdots u_n^{\alpha_n}, \tag{31}$$

where all exponents $\alpha_j$ are non-negative, satisfying

$$\sum_{j=1}^{n} \alpha_j = 1. \tag{32}$$

The optimization is the maximization of the utility

$$\text{maximize} \quad q \tag{33}$$

$$\text{subject to} \quad \sum_{j=1}^{n} w_j u_j = K, \tag{34}$$

where $w_j$ $(j = 1, 2, ..., n)$ are known weights.

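This problem can be solved using the Lagrange multiplier method as an unconstrained problem

$$\Psi = \prod_{j=1}^{n} u_j^{\alpha_j} + \lambda \Big( \sum_{j=1}^{n} w_j u_j - K \Big), \tag{35}$$

whose optimality conditions are

$$\frac{\partial \Psi}{\partial u_j} = \alpha_j u_j^{-1} \prod_{k=1}^{n} u_k^{\alpha_k} + \lambda w_j = 0, \quad (j = 1, 2, ..., n), \tag{36}$$

$$\frac{\partial \Psi}{\partial \lambda} = \sum_{j=1}^{n} w_j u_j - K = 0. \tag{37}$$

The solutions are

$$u_1 = \frac{K}{w_1 \big[ 1 + \frac{1}{\alpha_1} \sum_{j=2}^{n} \alpha_j \big]}, \quad u_j = \frac{w_1 \alpha_j}{w_j \alpha_1} u_1, \tag{38}$$

where $(j = 2, 3, ..., n)$. For example, in a special case of $n = 2$, $\alpha_1 = 2/3$, $\alpha_2 = 1/3$, $w_1 = 5$, $w_2 = 2$
and $K = 300$, we have

$$u_1 = \frac{K}{w_1 (1 + \alpha_2/\alpha_1)} = 40, \quad u_2 = \frac{K \alpha_2}{w_2 \alpha_1 (1 + \alpha_2/\alpha_1)} = 50.$$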
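
The analytical solution can be checked numerically. The Python sketch below evaluates the closed-form optimum for given exponents, weights and budget $K$; the function name `cobb_douglas_optimum` is an illustrative choice, and the formula is the one obtained from the optimality conditions (36) and (37).

```python
def cobb_douglas_optimum(alphas, weights, K):
    """Closed-form stationary point of the Cobb-Douglas problem.

    u_1 = K / (w_1 * (1 + sum_{j>=2} alpha_j / alpha_1)),
    u_j = w_1 * alpha_j * u_1 / (w_j * alpha_1) for j >= 2.
    """
    a1, w1 = alphas[0], weights[0]
    u1 = K / (w1 * (1.0 + sum(alphas[1:]) / a1))
    rest = [w1 * a * u1 / (w * a1) for a, w in zip(alphas[1:], weights[1:])]
    return [u1] + rest


u = cobb_douglas_optimum([2.0 / 3.0, 1.0 / 3.0], [5.0, 2.0], 300.0)
budget = 5.0 * u[0] + 2.0 * u[1]  # should reproduce K = 300
```

For the special case above this gives $u_1 = 40$ and $u_2 = 50$ up to rounding, and the budget constraint $5 \times 40 + 2 \times 50 = 300$ is met.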

As most real-world problems have some uncertainty, we can now add some noise to the above
problem. For simplicity, we just modify the constraint as

$$\sum_{j=1}^{n} w_j u_j = K (1 + \beta \epsilon), \tag{39}$$

where $\epsilon$ is a random number drawn from a Gaussian distribution with zero mean and unit
variance, and $0 < \beta \ll 1$ is a small positive number.

We now solve this problem as an optimization problem by the proposed APSO-SVM. In the
case of $\beta = 0.01$, the results are summarized in Table 1, where the values are provided for
different problem sizes $n$ with different numbers of iterations. We can see that the results converge
to the optimal solution very quickly.

Table 1: Mean deviations from the optimal solutions.

size n    Iterations    deviations
10        1000          0.014
20        5000          0.037
50        5000          0.040
50        15000         0.009



6 Income Prediction 

Studies to improve the accuracy of classifications are extensive. For example, Kohavi proposed a
decision-tree hybrid in 1996 [10]. Furthermore, an efficient training algorithm for support vector
machines was proposed by Platt in 1998 [17, 18], and it has had a significant impact on machine
learning, regression and data mining.

A well-known benchmark for classification and regression is income prediction using data
sets with a selected 14 attributes of a household from a census form [10, 17]. We use the same
data sets at ftp://ftp.ics.uci.edu/pub/machine-learning-databases/adult for this case study. There
are 32561 samples in the training set with 16281 for testing. The aim is to predict whether an individual's
income is above or below 50K.

Among the 14 attributes, a subset can be selected, and a subset such as age, education level,
occupation, gender and working hours is commonly used.

Using the proposed APSO-SVM and choosing the limit value of C as 1.25, the best error of 
17.23% is obtained (see Table 2), which is comparable with most accurate predictions reported in 
[10, 17]. 

Table 2: Income prediction using APSO-SVM.

Train set (size)    Prediction set    Errors (%)
512                 256               24.9
1024                256               20.4
16400               8200              17.23

6.1 Project Scheduling

Scheduling is an important class of discrete optimization with a wide range of applications in
business intelligence. For resource-constrained project scheduling problems, there exists a standard
benchmark library by Kolisch and Sprecher [11, 12]. The basic model consists of $J$ activities/tasks,
and some activities cannot start before all their predecessors $h$ are completed. In addition, each activity
$j = 1, 2, ..., J$ can be carried out, without interruption, in one of the $M_j$ modes, and performing any
activity $j$ in any chosen mode $m$ takes $d_{jm}$ periods, which is supported by a set of renewable resources
$R$ and non-renewable resources $N$. The project's makespan or upper bound is $T$, and the overall
capacity of non-renewable resources is $K_r^{\nu}$ where $r \in N$. For an activity $j$ scheduled in mode
$m$, it uses $k_{jmr}^{\rho}$ units of renewable resources and $k_{jmr}^{\nu}$ units of non-renewable resources in period
$t = 1, 2, ..., T$.

For activity $j$, the shortest duration is fitted into the time window $[EF_j, LF_j]$ where $EF_j$ is the
earliest finish time, and $LF_j$ is the latest finish time. Mathematically, this model can be written
as [11]

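$$\text{Minimize} \quad \Psi(\mathbf{x}) = \sum_{m=1}^{M_J} \sum_{t=EF_J}^{LF_J} t \cdot x_{Jmt}, \tag{40}$$

subject to

$$\sum_{m=1}^{M_h} \sum_{t=EF_h}^{LF_h} t \, x_{hmt} \le \sum_{m=1}^{M_j} \sum_{t=EF_j}^{LF_j} (t - d_{jm}) \, x_{jmt}, \quad (j = 2, ..., J),$$

$$\sum_{j=1}^{J} \sum_{m=1}^{M_j} k_{jmr}^{\rho} \sum_{q=\max\{t, EF_j\}}^{\min\{t + d_{jm} - 1, LF_j\}} x_{jmq} \le K_r^{\rho}, \quad (r \in R),$$

$$\sum_{j=1}^{J} \sum_{m=1}^{M_j} k_{jmr}^{\nu} \sum_{t=EF_j}^{LF_j} x_{jmt} \le K_r^{\nu}, \quad (r \in N), \tag{41}$$

and

$$\sum_{m=1}^{M_j} \sum_{t=EF_j}^{LF_j} x_{jmt} = 1, \quad j = 1, 2, ..., J, \tag{42}$$
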

where $x_{jmt} \in \{0, 1\}$ and $t = 1, ..., T$. As $x_{jmt}$ only takes two values, 0 or 1, this problem can be
considered as a classification problem, and metaheuristic support vector machine can be applied
naturally.
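
The binary encoding can be illustrated with a small Python sketch. Here a schedule is stored as a nested dictionary `x[j][m][t]` holding 0 or 1, which is a data layout assumed for this example only, as are the function names and the two-activity instance. The functions check constraint (42) and the precedence constraint in (41), with the finish time of an activity computed as the sum of $t \cdot x_{jmt}$.

```python
def finish_time(x, j):
    """Finish period of activity j: sum over modes m and periods t of t * x[j][m][t]."""
    return sum(t * flag
               for periods in x[j].values()
               for t, flag in periods.items())


def scheduled_exactly_once(x):
    """Constraint (42): each activity has exactly one (mode, finish period) pair."""
    return all(sum(flag for periods in modes.values() for flag in periods.values()) == 1
               for modes in x.values())


def precedence_ok(x, durations, predecessors):
    """Precedence constraint in (41): each predecessor h finishes before j starts."""
    for j, preds in predecessors.items():
        start_j = sum((t - durations[j][m]) * flag
                      for m, periods in x[j].items()
                      for t, flag in periods.items())
        if any(finish_time(x, h) > start_j for h in preds):
            return False
    return True


# Hypothetical two-activity instance: activity 1 finishes at t = 3;
# activity 2 (duration 2 in mode 1) finishes at t = 5, so it starts at t = 3.
x = {1: {1: {3: 1, 4: 0}}, 2: {1: {5: 1, 6: 0}}}
durations = {1: {1: 3}, 2: {1: 2}}
predecessors = {2: [1]}
```

For this instance both checks pass; moving the finish of activity 1 to $t = 4$ would violate the precedence constraint, since activity 2 would then start before its predecessor ends.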



Table 3: Kernel parameters used in SVM.

Number of iterations    SVM kernel parameters
1000                    C = 149.2, σ² = 67.9
5000                    C = 127.9, σ² = 64.0




Using the online benchmark library [12], we have solved this type of problem with $J = 30$
activities (the standard test set j30). The run time on a modern desktop computer is about 2.2
seconds for $N = 1000$ iterations to 15.4 seconds for $N = 5000$ iterations. We have run the simulations
50 times so as to obtain meaningful statistics.

The optimal kernel parameters found for the support vector machines are listed in Table 3, while 
the deviations from the known best solution are given in Table 4 where the results by other methods 
are also compared. 



Table 4: Mean deviations from the optimal solution (J=30).

Algorithm           Authors                    N = 1000    N = 5000
PSO [22]            Kemmoe et al. (2007)       0.26        0.21
Hybrid GA [23]      Valls et al. (2007)        0.27        0.06
Tabu search [15]    Nonobe & Ibaraki (2002)    0.46        0.16
Adapting GA [4]     Hartmann (2002)            0.38        0.22
Meta APSO-SVM       this paper                 0.19        0.025




From these tables, we can see that the proposed metaheuristic support vector machine starts
very well, and its results are comparable with those by other methods such as the hybrid genetic algorithm.
In addition, it converges more quickly as the number of iterations increases. With the same number
of function evaluations involved, much better results are obtained, which implies that APSO is very
efficient, and subsequently the APSO-SVM is also efficient in this context. In addition, this also
suggests that the proposed framework is appropriate for automatically choosing the right parameters
for SVM and solving nonlinear optimization problems.

7 Conclusions 

Both PSO and support vector machines are now widely used as optimization techniques in business 
intelligence. They can also be used for data mining to extract useful information efficiently. SVM can 
also be considered as an optimization technique in many applications including business optimization. 
When there is noise in data, some averaging or reformulation may lead to better performance. In 
addition, metaheuristic algorithms can be used to find the optimal kernel parameters for a support 
vector machine and also to search for the optimal solutions. We have used three very different case
studies to demonstrate that such a metaheuristic SVM framework works.

Automatic parameter tuning and efficiency improvement will be an important topic for further
research. It can be expected that this framework can be used for other applications. Furthermore,
APSO can also be combined with other algorithms such as neural networks to produce more
efficient algorithms [13, 14]. More studies in this area are highly needed.